Overview of LASSO-ShCu Data for Southern Great Plains

The LASSO Shallow-Convection scenario focuses on shallow convective clouds at the Southern Great Plains (SGP) atmospheric observatory in Oklahoma. Ensembles of idealized large-eddy simulation (LES) runs are available for 95 case dates spanning the years 2015-2019.

Useful links for more information:

Author: William.Gustafson@pnnl.gov
Date: 7-May-2024

# Libraries required for this tutorial...

from datetime import datetime, timedelta
import numpy as np
import pandas as pd
import os
import xarray as xr

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.gridspec as gridspec

import plotly.graph_objects as go

Available LASSO-ShCu Datastreams

LASSO-ShCu consists of a suite of datastreams that combine a library of LES simulations with ARM observations to put the simulations in the context of reality for each case date. The following datastreams can be downloaded from the LASSO-ShCu Bundle Browser:

  1. sgplassodiagconfobsmod: Config-Obs-Model

    • Input data necessary to reproduce a simulation

    • Skill score information and plots summarizing model results for each simulation in the case date’s ensemble, typically hourly resolution

  2. sgplassodiagraw: Raw Model Output

    • Hourly wrfout files, each containing instantaneous model snapshots at 10-minute intervals

    • wrfstat files containing LES statistics every 10 minutes, e.g., output-period-averaged domain-mean profiles

  3. sgplassocogsdiagobsmod: Clouds Optically Gridded by Stereo (COGS)-based skill analysis; when available (2018-2019 only), this is considered more reliable than the ARSCL-based skill scores for cloud fraction in sgplassodiagconfobsmod

    • 1-minute and 10-minute sampled cloud fraction observations using COGS

    • Overview cloud-fraction skill-score values and plots for the simulation ensemble for the selected case date

    • Simulation-specific cloud-fraction skill-score information for each simulation on the selected case date

  4. sgplassohighfreqobs: High-frequency observations that were used to build sgplassodiagconfobsmod

    • sgp915rwpwindcon10mC1.c1: Wind profiles retrieved from radar-wind profilers at C1, I8, I9, and I10, at 10-minute intervals

    • sgpcldfracset*C1.c1: KAZR-ARSCL derived time-height cloud fraction profiles for 1, 5, and 15-minute intervals

    • sgplassoblthermoC1.c1: Boundary-layer thermodynamic information for 500-700 m above surface from AERIoe and Raman lidar retrievals; temperature, water vapor mixing ratio, relative humidity, and pressure at 10-minute intervals

    • sgplassodlcbhshcu*.c1: Cloud-base height from Doppler lidar at stations C1, E32, E37, E39, and E41

    • sgplassolwpC1.c1: Liquid water path retrieved from AERI and MWRRET

    • sgplclC1.c1: Lifting condensation level for meteorology stations in the Oklahoma region

Downloading and Organizing the Datastreams

All of the above datastreams can be downloaded from the LASSO-ShCu Bundle Browser. The table at the bottom of that webpage updates based on selections made in the checkboxes along the left-hand side of the page. One then selects the desired simulations and associated files within the table. When all selections have been made, click the Order Data button at the bottom of the page. Users will have the most reliable download experience by also turning on the Globus option within Download Options. Most of the datastreams are manageable in size, with the exception of sgplassodiagraw, which can be large for some users to work with; its size can cause web-based downloads to fail partway through, and Globus gets around this issue.

Some datastreams contain files for just one simulation, while others hold multiple simulations' worth of information, so the folder structures within the datastreams vary. An easy way to organize the data is in a tree structure by case date and simulation ID. This can be done by placing the downloaded tar files into a single folder and then running support_code/stage_lasso_shcu_data.py from this tutorial's sub-folder.

The remainder of this notebook assumes one is working with data organized using stage_lasso_shcu_data.py.
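The resulting layout can be sketched programmatically. The helper below builds a small sample tree mirroring the case-date/simulation-ID structure implied by the file paths used later in this notebook (the sample filenames are placeholders, not actual data):

```python
import os
import tempfile

def list_tree(root):
    """Return sorted relative paths of all files under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(paths)

# Build an illustrative tree: <case date>/sim<ID>/raw_model and obs_model...
with tempfile.TemporaryDirectory() as root:
    for rel in [
        "20190404/sim0004/raw_model/wrfstat_d01_2019-04-04_12:00:00.nc",
        "20190404/sim0004/obs_model/sgplassocogsdiagobsmod4C1.m1.20190404.120000.nc",
    ]:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    for p in list_tree(root):
        print(p)
```

Running `list_tree` on your own staging folder is a quick way to confirm the script organized the downloads as expected.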

Raw WRF Output from LASSO-ShCu

The core of LASSO-ShCu is raw simulation output from the Weather Research and Forecasting (WRF) model, which has been run in an idealized LES mode with doubly periodic boundaries. The traditional wrfout file from WRF is provided to the users as-is. The bulk of the variables within the file are as one would expect from WRF. Additionally, some LES-specific information is output, such as the forcing tendencies. LES modelers frequently work with summary statistics instead of cell-by-cell data, so we also output a wrfstat file that provides 10-minute averages for various meteorology variables and diagnostics. Diagnostics include variables like cloud and ice water paths, domain-averaged profiles, fluxes, and in-cloud statistics.
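The distinction between period-averaged wrfstat values and instantaneous wrfout values matters whenever a field varies within an output period. A toy illustration with a synthetic oscillating signal:

```python
import numpy as np

# A field that oscillates within a single 10-minute output period...
dt = 1.0  # illustrative sampling step, minutes
t = np.arange(0, 10, dt)
signal = np.sin(2 * np.pi * t / 10.0)

# wrfout-style sampling: the instantaneous value at the output time.
instantaneous = signal[-1]

# wrfstat-style sampling: the mean over the whole output period.
period_average = signal.mean()

print(f"instantaneous={instantaneous:.3f}  period average={period_average:.3f}")
```

Here the period average is near zero while the instantaneous snapshot is not, so comparing the two output styles directly can be misleading for rapidly varying fields.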

Let’s take a look at a wrfstat file first since these are smaller and easier to work with.

# Plotting wrfstat variables...

path_shcu_root = "/gpfs/wolf2/arm/atm124/world-shared/arm-summer-school-2024/lasso_tutorial/ShCu/untar/"

case_date = datetime(2019, 4, 4)
sim_id = 4

ds_stat = xr.open_dataset(f"{path_shcu_root}/{case_date:%Y%m%d}/sim{sim_id:04d}/raw_model/wrfstat_d01_{case_date:%Y-%m-%d_12:00:00}.nc")
ds_stat
<xarray.Dataset> Size: 72GB
Dimensions:         (Time: 91, bottom_top: 226, bottom_top_stag: 227,
                     south_north: 250, west_east: 250, west_east_stag: 251,
                     south_north_stag: 251)
Coordinates:
    XTIME           (Time) datetime64[ns] 728B ...
Dimensions without coordinates: Time, bottom_top, bottom_top_stag, south_north,
                                west_east, west_east_stag, south_north_stag
Data variables: (12/179)
    Times           (Time) |S19 2kB ...
    CST_CLDLOW      (Time) float32 364B ...
    CST_CLDTOT      (Time) float32 364B ...
    CST_LWP         (Time) float32 364B ...
    CST_IWP         (Time) float32 364B ...
    CST_PRECW       (Time) float32 364B ...
    ...              ...
    CSV_IWC         (Time, bottom_top, south_north, west_east) float32 5GB ...
    CSV_CLDFRAC     (Time, bottom_top, south_north, west_east) float32 5GB ...
    CSS_LWP         (Time, south_north, west_east) float32 23MB ...
    CSS_IWP         (Time, south_north, west_east) float32 23MB ...
    CSS_CLDTOT      (Time, south_north, west_east) float32 23MB ...
    CSS_CLDLOW      (Time, south_north, west_east) float32 23MB ...
Attributes: (12/96)
    TITLE:                                  OUTPUT FROM WRF V3.8.1 MODEL
    START_DATE:                            2019-04-04_12:00:00
    WEST-EAST_GRID_DIMENSION:              251
    SOUTH-NORTH_GRID_DIMENSION:            251
    BOTTOM-TOP_GRID_DIMENSION:             227
    DX:                                    100.0
    ...                                    ...
    config_aerosol:                        NA
    config_forecast_time:                  15.0 h
    config_boundary_method:                Periodic
    config_microphysics:                   Thompson (mp_physics=8)
    config_nickname:                       runlas20190404v1addhm
    simulation_origin_host:                cumulus-login2.ccs.ornl.gov

Notice that there are several categories of variable names:

  1. CST are time series where all the spatial dimensions have been collapsed via averaging or vertical integration. An example is CST_LWP for the domain-average liquid water path.

  2. CSP are time series of profiles. The X-Y dimensions have been collapsed via averaging but vertical information is retained. The variable related to CST_LWP is CSP_LWC for the domain-averaged liquid water content profile.

  3. CSS are time series of X-Y slices. Continuing on the theme of quantifying the condensate, CSS_LWP is the liquid water path with X-Y information retained.

  4. CSV are full-volume variables with X-Y-Z-T dimensions. A condensate example would be CSV_QR for the rainwater mixing ratio. This is similar to the QRAIN variable output normally by WRF. However, the variables in wrfstat are averaged between output times, whereas variables in the wrfout files are instantaneous.
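The naming convention maps directly onto the dimensions of each variable, which can be checked programmatically. A sketch using a toy dataset with the same dimension names as wrfstat (shapes shrunk for illustration):

```python
import numpy as np
import xarray as xr

# Toy dataset mimicking the wrfstat dimension conventions with tiny shapes...
nt, nz, ny, nx = 3, 4, 5, 5
ds = xr.Dataset(
    {
        "CST_LWP": (("Time",), np.zeros(nt)),                           # time series
        "CSP_LWC": (("Time", "bottom_top"), np.zeros((nt, nz))),        # profile series
        "CSS_LWP": (("Time", "south_north", "west_east"),
                    np.zeros((nt, ny, nx))),                            # X-Y slice series
        "CSV_LWC": (("Time", "bottom_top", "south_north", "west_east"),
                    np.zeros((nt, nz, ny, nx))),                        # full volume
    }
)

# Each prefix implies a fixed set of dimensions...
expected = {
    "CST": ("Time",),
    "CSP": ("Time", "bottom_top"),
    "CSS": ("Time", "south_north", "west_east"),
    "CSV": ("Time", "bottom_top", "south_north", "west_east"),
}
for name, da in ds.data_vars.items():
    assert da.dims == expected[name[:3]], name
print("all prefixes match their expected dimensions")
```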

Plotting wrfstat data is straightforward since all output from a run is included in one file.

# By default, xarray does not interpret the wrfout/wrfstat time information in a way that attaches 
# it to each variable. Here is a trick to map the time held in XTIME to the Time coordinate 
# associated with each variable.
ds_stat["Time"] = ds_stat["XTIME"]

# After fixing the time coordinate, we can use xarray's plotting features to get time-labeled plots.

hour_to_plot = 17

ds_stat["CST_LWP"].plot()  # time series
plt.show()

ds_stat["CSP_LWC"].sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00").plot()  # profile at a selected time (plots sideways though)
plt.show()

ds_stat["CSS_LWP"].sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00").plot()  # X-Y slice for a selected time
plt.show()

ds_stat["CSV_LWC"].sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00").isel(south_north=1).plot()  # A vertical slice from the volume at a selected time
plt.show()
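Because wrfstat output is every 10 minutes, exact-match time selection only works for those timestamps. xarray's `method="nearest"` is handy when the time of interest falls between outputs; a sketch with a synthetic 10-minute time axis starting at 12:00 UTC like the wrfstat file's:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 10-minute time axis and placeholder values...
times = pd.date_range("2019-04-04 12:00", periods=7, freq="10min")
da = xr.DataArray(np.arange(len(times), dtype=float),
                  coords={"Time": times}, dims="Time")

# Exact selection works when the requested time is on the output grid...
print(da.sel(Time="2019-04-04 12:30").item())

# For an off-grid time, ask for the nearest available output...
print(da.sel(Time="2019-04-04 12:34", method="nearest").item())
```

Both calls return the value at 12:30 since that is the closest output time to 12:34.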
# Here is a 3-D volume rendering of the cloud liquid water content. We are only plotting 
# a sub-domain of the data to keep the memory and compute usage down. 

# Get the domain size and set the limitations on plot size...
nt, nz, ny, nx = ds_stat["CSV_LWC"].shape
nx_use = int(nx/3)
ny_use = int(ny/3)
nz_use = int(nz/4)

# Define the coordinates as a volume...
Z, Y, X = np.meshgrid(ds_stat["CSP_Z"].sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00").isel(bottom_top=slice(0,nz_use)).values*1e-3, 
                      np.arange(0,ny_use)*0.1, 
                      np.arange(0,nx_use)*0.1,
                      indexing='ij')

# Pull the data to plot...
plot_data = ds_stat["CSV_LWC"].\
    sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00").\
    isel(west_east=slice(0,nx_use), south_north=slice(0,ny_use), bottom_top=slice(0,nz_use)).\
    values

# Now, make the plot...
fig = go.Figure(data=go.Isosurface(
    x=X.flatten(), y=Y.flatten(), z=Z.flatten(),
    value=plot_data.flatten(),
    isomin=0.0001, isomax=0.001,
    colorscale='temps'
))
fig.show()

COGS Cloud Fraction

todo: section in development…

The Clouds Optically Gridded by Stereo (COGS) data set is very good for cloud fractions <0.5. How does the COGS cloud fraction compare to this simulation?

Let’s open one of the COGS obs-mod files to see how the hourly data is stored in these files. They contain both the observation and model values on aligned coordinates. Note the “source_type” coordinate’s attributes.

  • source_type == 0 is the processed observations

  • source_type == 1 is the processed model output

# We will use the processed, hourly cloud fraction data from the sgplassocogsdiagobsmod datastream.
ds_cogs = xr.open_dataset(f"{path_shcu_root}/{case_date:%Y%m%d}/sim{sim_id:04d}/obs_model/sgplassocogsdiagobsmod{sim_id}C1.m1.{case_date:%Y%m%d}.120000.nc")
ds_cogs
<xarray.Dataset> Size: 1kB
Dimensions:                               (time: 16, bound: 2, source_type: 2)
Coordinates:
  * time                                  (time) datetime64[ns] 128B 2019-04-...
  * source_type                           (source_type) int32 8B 0 1
Dimensions without coordinates: bound
Data variables:
    base_time                             datetime64[ns] 8B ...
    time_offset                           (time) datetime64[ns] 128B ...
    time_bounds                           (time, bound) datetime64[ns] 256B ...
    low_cloud_fraction_cogs               (time, source_type) float32 128B ...
    qc_low_cloud_fraction_cogs            (time, source_type) int32 128B ...
    low_cloud_fraction_cogs_goodfraction  (time, source_type) float32 128B ...
    low_cloud_fraction_cogs_std           (time, source_type) float32 128B ...
    lat                                   float32 4B ...
    lon                                   float32 4B ...
    alt                                   float32 4B ...
Attributes: (12/34)
    command_line:                          Not applicable
    Conventions:                           ARM-1.3
    process_version:                       Not applicable
    dod_version:                           lassocogsdiagobsmod4-m1-1.0
    input_datastreams:                     sgplassodiagmod4C1.m1 : 1.1 : 2019...
    site_id:                               sgp
    ...                                    ...
    config_forecast_time:                  15.0 h
    config_boundary_method:                Periodic
    config_microphysics:                   Thompson (mp_physics=8)
    config_nickname:                       runlas20190404v1addhm
    doi:                                   10.5439/1673163
    history:                               created by user ttoto on machine a...

Plotting these side-by-side is now easy since everything is aligned and sampled similarly.

ds_cogs["low_cloud_fraction_cogs"].isel(source_type=0).plot(label="COGS")
ds_cogs["low_cloud_fraction_cogs"].isel(source_type=1).plot(label="WRF")
plt.legend()
plt.show()
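Beyond eyeballing the curves, the aligned source_type coordinate makes simple comparison metrics one-liners. A sketch with synthetic cloud-fraction series standing in for `low_cloud_fraction_cogs` (with real data you would select `source_type=0` for the observations and `source_type=1` for the model the same way):

```python
import numpy as np
import xarray as xr

# Synthetic hourly cloud fractions with the same (time, source_type) layout
# as low_cloud_fraction_cogs: source_type 0 = observations, 1 = model...
rng = np.random.default_rng(1)
obs = np.clip(rng.random(16) * 0.5, 0.0, 1.0)
mod = np.clip(obs + rng.normal(0.0, 0.05, 16), 0.0, 1.0)
da = xr.DataArray(np.stack([obs, mod], axis=-1),
                  dims=("time", "source_type"),
                  coords={"source_type": [0, 1]})

# Model minus observations, then simple summary statistics...
diff = da.sel(source_type=1) - da.sel(source_type=0)
bias = float(diff.mean())
rmse = float(np.sqrt((diff ** 2).mean()))
print(f"bias={bias:.3f} rmse={rmse:.3f}")
```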